
Training Data Attribution


Training Data Attribution for Image Generation using Ontology-Aligned Knowledge Graphs

Aivalis, Theodoros, Klampanos, Iraklis A., Troumpoukis, Antonis, Jose, Joemon M.

arXiv.org Artificial Intelligence

As generative models become more powerful, concerns around transparency, accountability, and copyright violations have intensified. Understanding how specific training data contributes to a model's output is critical. We introduce a framework for interpreting generative outputs through the automatic construction of ontology-aligned knowledge graphs (KGs). While automatic KG construction from natural text has advanced, extracting structured, ontology-consistent representations from visual content remains challenging because of the richness and multi-object nature of images. Leveraging multimodal large language models (LLMs), our method extracts structured triples from images, aligned with a domain-specific ontology. By comparing the KGs of generated and training images, we can trace potential influences, enabling copyright analysis, dataset transparency, and interpretable AI. We validate our method through experiments on locally trained models via unlearning, and on large-scale models through a style-specific experiment. Our framework supports the development of AI systems that foster human collaboration and creativity and stimulate curiosity.
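The comparison step of this pipeline, matching the KG of a generated image against the KGs of training images, can be sketched as a set-overlap ranking. The triples, image names, and the choice of Jaccard similarity below are illustrative assumptions, not the paper's implementation:

```python
# Hypothetical sketch: rank training images by knowledge-graph overlap with a
# generated image. Triple contents and the Jaccard metric are assumptions.

def kg_similarity(kg_a, kg_b):
    """Jaccard similarity between two sets of (subject, predicate, object) triples."""
    a, b = set(kg_a), set(kg_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def rank_influences(generated_kg, training_kgs):
    """Sort training images by KG overlap with the generated image, highest first."""
    scores = {name: kg_similarity(generated_kg, kg) for name, kg in training_kgs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

generated = {("person", "wears", "hat"), ("person", "rides", "horse")}
training = {
    "img_001": {("person", "wears", "hat"), ("person", "rides", "horse"),
                ("sky", "is", "cloudy")},
    "img_002": {("dog", "chases", "ball")},
}
ranking = rank_influences(generated, training)  # img_001 ranks first
```

In practice the triples would come from a multimodal LLM constrained by the ontology, and the similarity measure could weight predicates differently; plain Jaccard is only the simplest choice.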


Low-Perplexity LLM-Generated Sequences and Where To Find Them

Wuhrmann, Arthur, Kucherenko, Anastasiia, Kucharavy, Andrei

arXiv.org Artificial Intelligence

As Large Language Models (LLMs) become increasingly widespread, understanding how specific training data shapes their outputs is crucial for transparency, accountability, privacy, and fairness. To explore how LLMs leverage and replicate their training data, we introduce a systematic approach centered on analyzing low-perplexity sequences: high-probability text spans generated by the model. Our pipeline reliably extracts such long sequences across diverse topics while avoiding degeneration, then traces them back to their sources in the training data. Surprisingly, we find that a substantial portion of these low-perplexity spans cannot be mapped to the corpus. For those that do match, we quantify the distribution of occurrences across source documents, highlighting the scope and nature of verbatim recall and paving the way toward a better understanding of how LLMs' training data impacts their behavior.
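The two stages described here can be illustrated with a toy sketch: slide a window over a generated token sequence, keep spans whose perplexity (the exponential of the mean token negative log-likelihood) falls below a threshold, then look the surviving spans up verbatim in the corpus. The token NLLs, window size, and threshold are invented for illustration:

```python
import math

def low_perplexity_spans(tokens, token_nll, window=4, threshold=1.5):
    """Keep windowed spans whose perplexity exp(mean NLL) is below the threshold,
    i.e. spans the model generated with high probability."""
    spans = []
    for i in range(len(tokens) - window + 1):
        ppl = math.exp(sum(token_nll[i:i + window]) / window)
        if ppl < threshold:
            spans.append(" ".join(tokens[i:i + window]))
    return spans

def trace_to_corpus(spans, corpus_docs):
    """Map each low-perplexity span to the documents containing it verbatim."""
    return {s: [d for d, text in corpus_docs.items() if s in text] for s in spans}

tokens = ["the", "quick", "brown", "fox", "eats", "pizza"]
nll = [0.2, 0.1, 0.1, 0.2, 2.5, 3.0]   # model is confident on the idiom only
spans = low_perplexity_spans(tokens, nll)          # ["the quick brown fox"]
matches = trace_to_corpus(spans, {"doc1": "the quick brown fox jumps"})
```

A real pipeline would of course use model logits for the NLLs and an indexed corpus search rather than naive substring matching, and, as the abstract notes, many extracted spans would find no match at all.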


Training Data Attribution via Approximate Unrolling

Neural Information Processing Systems

Many training data attribution (TDA) methods aim to estimate how a model's behavior would change if one or more data points were removed from the training set. Methods based on implicit differentiation, such as influence functions, can be made computationally efficient, but fail to account for underspecification, the implicit bias of the optimization algorithm, or multi-stage training pipelines. By contrast, methods based on unrolling address these issues but face scalability challenges.


Enhancing Training Data Attribution for Large Language Models with Fitting Error Consideration

Wu, Kangxi, Pang, Liang, Shen, Huawei, Cheng, Xueqi

arXiv.org Artificial Intelligence

The black-box nature of large language models (LLMs) poses challenges in interpreting results, impacting issues such as data intellectual property protection and hallucination tracing. Training data attribution (TDA) methods are considered effective solutions to address these challenges. Most recent TDA methods rely on influence functions, assuming the model achieves minimized empirical risk. However, achieving this criterion is difficult, and sourcing accuracy can be compromised by fitting errors during model training. In this paper, we introduce a novel TDA method called Debias and Denoise Attribution (DDA), which enhances influence functions by addressing fitting errors. Specifically, the debias strategy seeks to improve the performance of influence functions by eliminating the knowledge bias present in the base model before fine-tuning, while the denoise strategy aims to reduce discrepancies in influence scores arising from varying degrees of fitting during the training process through smoothing techniques. Experimental results demonstrate that our method significantly outperforms existing approaches, achieving an average AUC of 91.64%. Moreover, DDA exhibits strong generality and scalability across various sources and models of different scales, such as LLaMA2, QWEN2, and Mistral.
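The two strategies can be caricatured in a few lines. This is a loose sketch of the ideas only, subtracting base-model influence ("debias") and smoothing scores across training checkpoints ("denoise"), with invented scores, not the DDA implementation:

```python
# Hedged sketch of the two ideas behind DDA, not the paper's algorithm.
# Influence scores and example names below are made up.

def debias(finetuned_scores, base_scores):
    """Subtract influence already explainable by the base model, so the
    remaining score reflects what fine-tuning contributed."""
    return {k: finetuned_scores[k] - base_scores.get(k, 0.0)
            for k in finetuned_scores}

def denoise(score_snapshots):
    """Average influence scores across checkpoints to damp fluctuations
    caused by varying degrees of fitting during training."""
    keys = score_snapshots[0].keys()
    n = len(score_snapshots)
    return {k: sum(s[k] for s in score_snapshots) / n for k in keys}

snapshots = [
    {"ex1": 0.9, "ex2": 0.1},   # influence scores at checkpoint 1
    {"ex1": 0.7, "ex2": 0.3},   # influence scores at checkpoint 2
]
smoothed = denoise(snapshots)
final = debias(smoothed, {"ex1": 0.5})   # ex1's base-model influence removed
```

The actual method operates on influence-function estimates from model gradients; the dictionaries here merely stand in for those scores.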


Towards User-Focused Research in Training Data Attribution for Human-Centered Explainable AI

Nguyen, Elisa, Bertram, Johannes, Kortukov, Evgenii, Song, Jean Y., Oh, Seong Joon

arXiv.org Artificial Intelligence

While Explainable AI (XAI) aims to make AI understandable and useful to humans, it has been criticised for relying too much on formalism and solutionism, focusing more on mathematical soundness than user needs. We propose an alternative to this bottom-up approach inspired by design thinking: the XAI research community should adopt a top-down, user-focused perspective to ensure user relevance. We illustrate this with a relatively young subfield of XAI, Training Data Attribution (TDA). With the surge in TDA research and growing competition, the field risks repeating the same patterns of solutionism. We conducted a needfinding study with a diverse group of AI practitioners to identify potential user needs related to TDA. Through interviews (N=10) and a systematic survey (N=31), we uncovered new TDA tasks that are currently largely overlooked. We invite the TDA and XAI communities to consider these novel tasks and improve the user relevance of their research outcomes.


Training Data Attribution via Approximate Unrolled Differentiation

Bae, Juhan, Lin, Wu, Lorraine, Jonathan, Grosse, Roger

arXiv.org Artificial Intelligence

Many training data attribution (TDA) methods aim to estimate how a model's behavior would change if one or more data points were removed from the training set. Methods based on implicit differentiation, such as influence functions, can be made computationally efficient, but fail to account for underspecification, the implicit bias of the optimization algorithm, or multi-stage training pipelines. By contrast, methods based on unrolling address these issues but face scalability challenges.
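The idea of unrolling can be shown on a toy problem: differentiate the final test loss through the entire SGD trajectory with respect to a per-example weight. Below that derivative is approximated by a finite difference through repeated training runs, a sketch of the concept rather than the paper's actual (and far more scalable) algorithm; the one-parameter model and data are invented:

```python
# Toy illustration of unrolling-based TDA: the sensitivity of the final test
# loss to each example's weight is measured through the training procedure
# itself. Finite differences stand in for true unrolled differentiation.

def train(xs, ys, weights, lr=0.1, steps=50):
    """SGD on a one-parameter model y = w * x with per-example weights."""
    w = 0.0
    for _ in range(steps):
        for x, y, eps in zip(xs, ys, weights):
            grad = eps * 2 * (w * x - y) * x
            w -= lr * grad
    return w

def unrolled_attribution(xs, ys, x_test, y_test, h=1e-4):
    """Estimate d(test loss)/d(example weight) through the training run."""
    scores = []
    for i in range(len(xs)):
        def test_loss(eps_i):
            weights = [1.0] * len(xs)
            weights[i] = eps_i
            w = train(xs, ys, weights)
            return (w * x_test - y_test) ** 2
        # central finite difference around weight 1.0
        scores.append((test_loss(1.0 + h) - test_loss(1.0 - h)) / (2 * h))
    return scores

xs, ys = [1.0, 1.0], [2.0, 0.0]   # two conflicting labels for the same input
scores = unrolled_attribution(xs, ys, x_test=1.0, y_test=2.0)
# example 0 (label 2.0) helps the test point: negative score;
# example 1 (label 0.0) hurts it: positive score
```

Unlike influence functions, this quantity depends on the optimizer, the step count, and the data ordering, which is exactly the property that lets unrolling capture implicit bias and multi-stage pipelines, at the cost of replaying (or differentiating through) training.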


LLM Attributor: Interactive Visual Attribution for LLM Generation

Lee, Seongmin, Wang, Zijie J., Chakravarthy, Aishwarya, Helbling, Alec, Peng, ShengYun, Phute, Mansi, Chau, Duen Horng, Kahng, Minsuk

arXiv.org Artificial Intelligence

While large language models (LLMs) have shown remarkable capability to generate convincing text across diverse domains, concerns around their potential risks have highlighted the importance of understanding the rationale behind text generation. We present LLM Attributor, a Python library that provides interactive visualizations for training data attribution of an LLM's text generation. Our library offers a new way to quickly attribute an LLM's text generation to training data points to inspect model behaviors, enhance its trustworthiness, and compare model-generated text with user-provided text. We describe the visual and interactive design of our tool and highlight usage scenarios for LLaMA2 models fine-tuned with two different datasets: online articles about recent disasters and finance-related question-answer pairs. Thanks to LLM Attributor's broad support for computational notebooks, users can easily integrate it into their workflow to interactively visualize attributions of their models. For easier access and extensibility, we open-source LLM Attributor at https://github.com/poloclub/LLM-Attribution. The video demo is available at https://youtu.be/mIG2MDQKQxM.


Exploring Musical Roots: Applying Audio Embeddings to Empower Influence Attribution for a Generative Music Model

Barnett, Julia, Garcia, Hugo Flores, Pardo, Bryan

arXiv.org Artificial Intelligence

With today's models there is an opaque nature to the generation process: it is never clear to the end user what data influences and shapes their newly crafted essay from ChatGPT [39], digitized surrealist art from DALLE-2 [42], or soulful jazz in the style of Rihanna from MusicLM [1]. Even further, due to the vast amounts of data they were trained on, it is usually not even clear when these models are "creating" near replicas of existing items from their training data. For users of generative models to be informed and responsible creators, there needs to be a mechanism that provides information about works in the model's training data that were highly influential upon the generated output, or directly copied by the model. This would allow the user both to cite existing work and to learn about the influences of their generated output. We assume that a model-generated product that is a copy or near-copy of a work in the model's training set indicates the model was influenced by that work. To develop methods that automatically detect the influences upon model-generated products, it is therefore essential to develop good measures of similarity between works. In text, it is straightforward to detect when language models copy strings of text verbatim, given access to the training data, and there is a growing body of work quantifying the degree to which these large language models memorize training data [10, 12, 23]. In the image space, the problem is more complex due to the high-resolution, multi-pixel outputs of models, but work is being done to detect "approximate memorization" by finding highly similar images in the training data.
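The similarity-based attribution this abstract argues for reduces, in its simplest form, to embedding each work and ranking training items by cosine similarity to the generated output. The toy vectors below stand in for real audio embeddings; names and values are assumptions for illustration:

```python
import math

# Generic sketch of embedding-based influence attribution: embed the generated
# item and every training item, then surface the nearest training items.
# The vectors here are made-up stand-ins, not real audio features.

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nearest_training_items(generated_emb, training_embs, top_k=2):
    """Rank training items by similarity to the generated item."""
    ranked = sorted(training_embs.items(),
                    key=lambda kv: cosine(generated_emb, kv[1]),
                    reverse=True)
    return ranked[:top_k]

generated = [0.9, 0.1, 0.0]
training = {
    "track_a": [1.0, 0.0, 0.0],   # near-copy candidate
    "track_b": [0.0, 1.0, 0.0],   # unrelated
    "track_c": [0.6, 0.4, 0.0],   # partial influence
}
top = nearest_training_items(generated, training)
```

The hard part, which the paper addresses, is choosing an embedding whose distances track human judgments of musical similarity; the ranking machinery itself is this simple.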


Exploring Practitioner Perspectives On Training Data Attribution Explanations

Nguyen, Elisa, Kortukov, Evgenii, Song, Jean Y., Oh, Seong Joon

arXiv.org Artificial Intelligence

Explainable AI (XAI) aims to provide insight into opaque model reasoning to humans and as such is an interdisciplinary field by nature. In this paper, we interviewed 10 practitioners to understand the possible usability of training data attribution (TDA) explanations and to explore the design space of such an approach. We confirmed that training data quality is often the most important factor for high model performance in practice and model developers mainly rely on their own experience to curate data. End-users expect explanations to enhance their interaction with the model and do not necessarily prioritise but are open to training data as a means of explanation. Within our participants, we found that TDA explanations are not well-known and therefore not used. We urge the community to focus on the utility of TDA techniques from the human-machine collaboration perspective and broaden the TDA evaluation to reflect common use cases in practice.


Training Data Attribution for Diffusion Models

Dai, Zheng, Gifford, David K.

arXiv.org Artificial Intelligence

Diffusion models have become increasingly popular for synthesizing high-quality samples based on training datasets. However, given the oftentimes enormous sizes of the training datasets, it is difficult to assess how training data impact the samples produced by a trained diffusion model. The difficulty of relating diffusion model inputs and outputs poses significant challenges to model explainability and training data attribution. Here we propose a novel solution that reveals how training data influence the output of diffusion models through the use of ensembles. In our approach individual models in an encoded ensemble are trained on carefully engineered splits of the overall training data to permit the identification of influential training examples. The resulting model ensembles enable efficient ablation of training data influence, allowing us to assess the impact of training data on model outputs. We demonstrate the viability of these ensembles as generative models and the validity of our approach to assessing influence.
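The ensemble idea can be sketched abstractly: train models on engineered splits of the data, then compare a query score between models whose splits contained a given example and models whose splits did not. The "models" below are just per-split averages standing in for generative models, and all names and values are invented:

```python
# Toy sketch of ensemble-based attribution: models trained on different data
# splits let us ablate a training example's influence by comparing models
# that saw it against models that did not.

def train_split_models(splits):
    """One 'model' per split: here, simply the mean of its training values."""
    return {name: sum(vals) / len(vals) for name, vals in splits.items()}

def influence_of(example, splits, models, query):
    """Difference in query score between models that saw the example
    and models that did not (higher = the example helped)."""
    saw = [models[n] for n, vals in splits.items() if example in vals]
    unseen = [models[n] for n, vals in splits.items() if example not in vals]
    score = lambda ms: sum(-abs(m - query) for m in ms) / len(ms)
    return score(saw) - score(unseen)

splits = {"s1": [1.0, 9.0], "s2": [1.0, 2.0], "s3": [8.0, 9.0]}
models = train_split_models(splits)
# How much does training on the value 9.0 move models toward a query of 9.0?
infl = influence_of(9.0, splits, models, query=9.0)   # positive: it helps
```

In the paper the splits are engineered so that each training example's presence pattern across the ensemble identifies it, and the score is the model's ability to produce a given sample; the mean-of-values "model" here only makes the ablation logic concrete.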